A quality index for decision tree pruning

Authors

  • Dominique Fournier
  • Bruno Crémilleux
Abstract

A decision tree is a divide-and-conquer classification method used in machine learning. Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some sub-trees that do not decrease the error rate can still be relevant, either to point out populations of specific interest or to give a representation of a large data file. We present a new pruning method, called DI pruning. It takes into account the complexity of sub-trees and is able to keep sub-trees whose leaves yield relevant decision rules, even though they do not increase classification efficiency. DI pruning also makes it possible to assess the quality of the data used for the knowledge discovery task. In practice, this method is implemented in the UnDeT software.
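To make the contrast in the abstract concrete, here is a minimal sketch of the conventional approach it argues against: pruning driven purely by error rate (reduced-error pruning on a held-out validation set). This is not DI pruning, whose quality index is not given on this page. The `Node` class, the feature names, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: a leaf when it has no children."""
    label: str                       # majority class at this node
    feature: Optional[str] = None    # feature tested here (None for a leaf)
    children: dict = field(default_factory=dict)  # feature value -> Node

    def is_leaf(self) -> bool:
        return not self.children


def predict(node: Node, x: dict) -> str:
    # Walk down the tree; fall back to the current label on unseen values.
    while node.children:
        child = node.children.get(x.get(node.feature))
        if child is None:
            return node.label
        node = child
    return node.label


def errors(node: Node, data: list) -> int:
    # Number of misclassified (example, class) pairs.
    return sum(predict(node, x) != y for x, y in data)


def reduced_error_prune(node: Node, data: list) -> Node:
    """Bottom-up: collapse a sub-tree into a leaf whenever doing so
    does not increase the error count on the validation data reaching it."""
    if node.is_leaf():
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in data if x.get(node.feature) == value]
        node.children[value] = reduced_error_prune(child, subset)
    leaf = Node(label=node.label)
    return leaf if errors(leaf, data) <= errors(node, data) else node


# Toy tree: the "windy" sub-tree does not help on the validation data.
tree = Node(label="no", feature="outlook", children={
    "sunny": Node(label="no", feature="windy", children={
        True: Node(label="no"),
        False: Node(label="yes"),
    }),
    "rainy": Node(label="yes"),
})
validation = [
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "rainy"}, "yes"),
]
pruned = reduced_error_prune(tree, validation)
```

In this sketch the "windy" sub-tree is collapsed only because it does not reduce validation errors. Whether its leaves isolate a population of specific interest plays no role in the decision, which is the limitation the abstract describes.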

Similar resources

Evaluation of liquefaction potential based on CPT results using C4.5 decision tree

The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...

Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors

With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...

A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization

In this work, we present a new bottom-up algorithm for decision tree pruning that is very efficient (requiring only a single pass through the given tree), and prove a strong performance guarantee for the generalization error of the resulting pruned tree. We work in the typical setting in which the given tree T may have been derived from the given training sample S, and thus may badly overfit S. In...

Impact of learning set quality and size on decision tree performances

The quality of a decision tree is usually evaluated through its complexity and its generalization accuracy. Tree-simplification procedures aim at optimizing these two performance criteria. Among them, data reduction techniques differ from pruning by their simplification strategy. Actually, while pruning algorithms directly control tree size to combat the overfitting problem, data reduction techniq...

Journal:
  • Knowl.-Based Syst.

Volume 15

Publication date: 2002